Parallel Processing Using the Silicon Graphics / Cray Origin 2000

Author

  • Bruce E. Tucker
Abstract

The Origin 2000 is a high performance computing platform produced jointly by Silicon Graphics / Cray. This scalable shared memory processor (SSMP) may be configured with up to 128 processors in a single system image. The Origin is a scalable, cache coherent, non-uniform memory access (CC-NUMA), distributed shared memory (DSM) architecture based on a hypercube interconnection topology. Effectively exploiting the parallel processing capabilities of this computer, and of other similar computers, is still very much an emerging technology. The objective of this paper is to demonstrate how this parallelism may be employed on practical computing problems, using the C programming language. The paper is aimed at software engineers who are moving from traditional single processor computers to a high performance parallel computing platform.

Section 1 Background in Parallel Processing

Parallel processing is a computing technique designed to reduce the execution time of an application by distributing the computational work among a number of processing elements. This definition is broad enough to encompass concurrent processing, distributed computing, and parallel computing with tightly coupled multiprocessors. This paper provides a general framework for employing multiple processing elements to work cooperatively on a computing task, using the Silicon Graphics / Cray Origin 2000 computer.

Ideally, parallel processing applications and architectures would exhibit "linear speedup." In other words, if the number of processors applied to a computing problem were doubled, the execution time would be cut in half. Unfortunately, only a small set of parallel computing applications can achieve linear speedup. Figure 1 illustrates one limitation of parallel processing for a fixed size calculation. Note that four processors give a speedup of 2.3 over a single processing element. The structure of the problem limits the speedup, so using additional processors will not improve the execution time.
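The gap between linear speedup and the observed 2.3 on four processors can be illustrated with a simple model. The sketch below uses Amdahl's law, which this abstract does not name and which may differ from the problem structure shown in Figure 1; it is an assumed model chosen only for illustration. The function names `amdahl_speedup`, `implied_parallel_fraction`, and `print_speedup_table` are hypothetical and do not come from the paper.

```c
#include <assert.h>
#include <stdio.h>

/* Amdahl's law (an assumed model, not taken from the paper):
 * predicted speedup on n processors when a fraction p (0 <= p <= 1)
 * of the single-processor run time can be spread across processors. */
double amdahl_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}

/* Inverse of the formula above: the parallel fraction that would
 * produce an observed speedup s on n > 1 processors. */
double implied_parallel_fraction(double s, int n)
{
    assert(n > 1 && s >= 1.0 && s <= n);
    return (1.0 - 1.0 / s) * n / (n - 1.0);
}

/* Print the predicted speedup for 1, 2, 4, ... processors up to max_n. */
void print_speedup_table(double p, int max_n)
{
    for (int n = 1; n <= max_n; n *= 2)
        printf("%4d processors: speedup %.2f\n", n, amdahl_speedup(p, n));
}
```

Under this assumed model, calling `implied_parallel_fraction(2.3, 4)` gives a parallel fraction of roughly 0.75, and passing that value to `print_speedup_table` shows the predicted speedup flattening as the processor count keeps doubling. The model is a simplification, so its numbers should not be read as the paper's own results.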

Similar resources

Parallelization of SCF calculations within Q-Chem ✩

We have incorporated MPI based parallelism with dynamic load balance into the Hartree–Fock and DFT modules of Q-Chem. A series of benchmark calculations consisting of both single point energy and gradient calculations were carried out to gauge the performance of the parallel modules. Calculations were carried out on two different parallel computers, namely a shared memory Silicon Graphics Origi...

The collective computing model

The parallel computing model presented in this paper, the Collective Computing model (CCM), is an improvement of the well-known Bulk Synchronous Parallel (BSP) model. The synchronicity imposed by the BSP model restricts the set of available algorithms and prevents the overlapping of computation and communication. Other models, like the LogP model, allow asynchronous computing and overlapping bu...

Parallel Implementation of Particle Swarm Optimization Variants Using Graphics Processing Unit Platform

There are different variants of Particle Swarm Optimization (PSO) algorithm such as Adaptive Particle Swarm Optimization (APSO) and Particle Swarm Optimization with an Aging Leader and Challengers (ALC-PSO). These algorithms improve the performance of PSO in terms of finding the best solution and accelerating the convergence speed. However, these algorithms are computationally intensive. The go...

Parallel Implementations of the Split-Step Fourier Method for Solving Nonlinear Schrödinger Systems

We present a parallel version of the well-known Split-Step Fourier method (SSF) for solving the Nonlinear Schrödinger equation, a mathematical model describing wave packet propagation in fiber optic lines. The algorithm is implemented under both distributed and shared memory programming paradigms on the Silicon Graphics/Cray Research Origin 200. The 1D Fast-Fourier Transform (FFT) is paralleliz...

A General Programming Model for Developing Scalable Ocean Circulation Applications

In this paper we describe our efforts in porting a global ocean model— the Miami Isopycnic Coordinate Ocean Model or MICOM—to clusters of symmetric multiprocessors (SMPs). This work extends our previous efforts in porting this same application to the massively parallel Cray T3D This research was supported by the Office of Naval Research under grant no. N00014-941-0846, by the National Science F...

Publication date: 1998